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Abstract 

We study the asymptotics in for complexity penalized least squares regression 
for the discrete approximation of finite-dimensional signals on continuous domains - 
e.g. images - by piecewise smooth functions. 

We introduce a fairly general setting which comprises most of the presently popular 
partitions of signal- or image- domains like interval-, wedgelet- or related partitions, 
as well as Delaunay triangulations. Then we prove consistency and derive convergence 
rates. Finally, we illustrate by way of relevant examples that the abstract results are 
useful for many applications. 

1 Introduction 

We are going to study consistency of special complexity penalized Least Squares estimators 
for noisy observations of finite-dimensional signals on multi-dimensional domains, in par- 
ticular of images. The estimators discussed in the present paper are based on partitioning 
combined with piecewise smooth approximation. In this framework, consistency is proved 
and convergence rates are derived in . Finally, the abstract results are applied to a couple 
of relevant examples, including popular methods like interval-, wedgelet- or related parti- 
tions, as well as Delaunay triangulations. Fig. [l] illustrates a typical wedgelet representation 
of a noisy image. 

Consistency is a strong indication that an estimation procedure is meaningful. More- 
over, it allows for structural insight since a sequence of discrete estimation procedures is 
embedded into a common continuous setting and the quantitative behaviour of estimators 
can be compared. It is frequently used as a substitute or approximation for missing or 
vague knowledge in the real finite sample situation. Plainly, one must be aware of various 
shortcomings and should not rely on asymptotics in case of small sample size. Neverthe- 
less, consistency is a broadly accepted justification of statistical methods. Convergence 
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Figure 1: A noisy image (left) and (right) a fairly rough wedgelet representation for n = 256. 
The (middle) picture also shows the boundaries of the smoothness regions. 

rates are of particular importance, since they indicate the quality of discrete estimates or 
approximations and allow for comparison of different methods. 

Observations or data will be governed by a simple regression model with additive white 
noise: Let 5*" = {1, ... , n}'^ be a finite discrete signal domain, interpreted as the discretiza- 
tion of the continuous domain S°° = [0, 1)''. Data y = {ys)seS" are available for the discrete 
domains at all levels n and generated by the model 

n" = /r + ff, neN, seS", (1) 

where (/")sgS" is a discretisation of an original or 'true' signal / on S°° and (^")seS" is 
white (sub-)Gaussian noise. 

The present approach is based on a partitioning of the discrete signal domain into re- 
gions on each of which a smooth approximation of noisy data is performed. The choice of a 
particular partition is obtained by a complexity penalized least squares estimation, depen- 
dent on the data. Between the regions, sharp breaks of intensity may happen, which allows 
for edge-preserving piecewise smoothing. In one dimension, a natural way to model jumps 
in signals is to consider piecewise regular functions. This leads naturally to representations 
based on partitions consisting of intervals. The number of intervals on a discrete line of 
length n is of polynomial order n? . 

In more dimensions, however, the definition of elementary fragments is much more 
involved. For example, in a discrete square of side-length n, the number of all subregions 
is of the exponential order 2" . When dealing with images, one of the difficulties consists 
in constructing reduced sets of fragments which, at the same time, take into account the 
geometry of images and lead to computationally feasible algorithms for the computation of 
estimators. 

The estimators adopted here are minimal points of complexity penalized least squares 
functionals: ii y — iys)seS" is a sample and x = {xs)s£S" a tentative representation of y, 
the functional 

H"ix,y)^j\n^)\+Y,{ys-Xs)' (2) 

has to be minimised in x given y; the penalty "f\^{x)\ is the number of subdomains into 
which the entire domain is divided and on which x is smooth in a sense to be made precise by 
the choice of suitable function spaces (see Sections [2T| and [s]) ; 7 is a tuning parameter. This 
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automatically results in a sparse representation of the function. Due to the non-convexity 
of this penalty one has to solve hard optimisation problems. 

These are not computationally feasible, if all possible partitions of the signal domain are 
admitted. A most popular attempt to circumvent this nuisance is simulated annealing, see 
for instance the seminal paper S. Geman and D. Geman (1984). This paper had a consid- 
erable impact on imaging; the authors transferred models from statistical physics to image 
analysis as prior distributions in the framework of Bayesian statistics. This approach was 
intimately connected with Markov Chain Monte Carlo Methods like Metropolis Sampling 
and Simulated Annealing, cf. G. Winkler (2003). 

On the other hand, transferring spatial complexity to time complexity like in such meta- 
heuristics, does not remove the basic problem; it rather transforms it. Such algorithms 
are not guaranteed to find the optimum or even a satisfactory near-optimal solution, cf. 
G. Winkler (2003), Section 6.2. All metaheuristics will eventually encounter problems 



on which they perform poorly. Moreover, if the number of partitions grows at least expo- 
nentially, it is difficult to derive useful uniform bounds on the projections of noise onto the 
subspaces induced by the partitions. Reducing the search space drastically allows to design 
exact and fast algorithms. Such a reduction basically amounts to restrictions on admissible 
partitions of the signal domain. There are various suggestions, some of them mentioned 
initially. 

In one dimension, regression onto piecewise constant functions was proposed by the 
legendary J.W. Tukey (1961) who called respective representations regressograms. The 
functional ([2|) is by some (including the authors) referred to as the Potts functional. It 
was introduced in R.B. Potts (1952) as a generalization of the well-known Ising model, 
E. IsiNG (1925), from statistical physics from two to more spins. It was suggested by 



W. Lenz ( 1920 ) and penalizes the length of contours between regions of constant spins. 



In fact, in one dimension a partition into say k intervals on which the signal is constant 
admits k — 1 jumps and therefore has contour- length fc — 1. 

The one-dimensional Potts model for signals was studied in detail in a series of theses and 
articles, see|G. Winkler and V. Liebscher| ( 2002[ );|G. Winkler et al.|(|2004[);|V. Lieb- 



SCHER and G. Winkler (1999); A. Kempe (2004); 



et al. (2005); O. Wittich et al. 



F. Friedrich (2005); G. Winkler 



(2008); F. Friedrich et al. (2008). Consistency was 



first adressed in A. Kempe (2004) and later on exhaustively treated in L. BoYSEN et al. 



(2009) and L. Boysen et al. (2007). Partitions consist there of intervals. Our study of the 



multi-dimensional case started with the thesis F. Friedrich (2005), see also F. Friedrich 



et aL| ( |2007[ ). 

In two or more dimensions, the model ([2| differs substantially from the classical Potts 
model. The latter penalizes the length of contours - locations of intensity breaks - whereas 
([2| penalizes the number of regions. This allows for instance to perform well on filamentous 
structures, albeit they have long borders compared to their area. 

Let us give an informal introduction into the setting. The aim is to estimate a function 
/ on the d-dimensional unit cube S°° = [0, l)'* from discrete data. To this end, S°° and 
/ are discretized to cubic grids — {I, . . . n E N, and functions /" on 5". On 

each stage n, data y", s € 5", is available, i.e. noisy observations of the We wi ll prove 
L^-convergence of complexity penalized least squares estimators /"(?/) (Section 2.2) for the 
/" (Section 2.1 1 to / and derive convergence rates, first in the general setting. We are 
faced with three kinds of error: the error caused by noise, the approximation and the often 
ignored error. Noise is essentially controlled regardless of the specific form of /. For the 
approximation and the discretisation error special assumptions on the function classes in 
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question are needed. 

Because of the approximation error term, there are deep connections to approximation 
theory. In particular, when deahng with piecewise regular images, non linear approximation 
rates obtained by wavelet shrinkage methods are known to be suboptimal, as discussed 
in |R. KoROSTELEV and TsYBAKOVI (|1993|) or ID. DonohoI (|1999|). In the last decade, 



the challenging problem to improve upon wavelets has been addressed in very different 
directions. 

The search for a good paradigm for detecting and representing curvilinear discontinuities 
of bivariate functions remains a fundamental issue in image analysis. Ideally, an efficient 
representation should use atomic decompositions which are local in space (like wavelets), 
but also possess appropriate directional properties (unlike wavelets). One of the most 
prominent examples is given by curvelet representations, which are based on multiscale 



directional filtering combined with anisotropic scaling. E. C ANDES and D. DoNOHO (2002) 



proved that thresholding of curvelet coefficients provides estimators which yield the minimax 
convergence rate up to a logarithmic factor for piecewise functions with boundaries. 
Another interesting representation is given by bandelets as proposed in |E. Le Pennec and| 
S. Mallat (2005). Bandelets are based on optimal local warping in the image domain 



relatively to the geometrical flow and C. DosSAL et al. (2011 ) proved also optimality of the 



minimax convergence rates of their bandelet-based estimator, for a larger class of functions 
including piecewise "^"^ functions with boundaries. 

The bidimensional examples discussed in Section [5] are based on more geometrical con- 
structions, to which the abstract framework proposed in Se ction [4| applies. 

Wedgelet partitions were introduced by D. DoNOHO (1999) and belong to the class 



of shape-preserving image segmentation methods. The decompositions are based on local 
polynomial approximation on some adaptively selected leaves of a quadtree structure. The 
use of a suitable data structure allowed for the development of fast algorithms for wedgelet 



decomposition, see F. Friedrich et al. (2007). 



An alternative is provided by anisotropic Delaunay triangulations, which have been 
proposed in the context of image compression in L. Demaret et al. (2006). The flexible 



design of the representing system allows for a particularly flue selection of triangles fltting 
the anisotropic geometrical features of images. In contrast to curvelets, such representations 
preserve the advantage of wavelets and are still able to approximate point singularities 



optimally, see L. Demaret and A. Iske (2012) 



Both wedgelet representations and anisotropic Delaunay triangulations lead to optimal 
non linear approximation rates for some classes of piecewise smooth functions. In the 
present paper, we prove optimality also for the convergence rates of the estimators. More 
precisely, we prove strong consistency rates of 

0(e2"/("+i)log(£„)),£„ = aVn', 

where is the variance of noise and a is a parameter controlling piecewise regularity. Such 
rates are known to be optimal up to the logarithmic factor. 



L. BiRGE and P. Massart (2007) showed recently that, in a similar setting, optimal 
rates without the log factor may be achieved with penalties different from those in ([2]) , and 
not merely proportional to the number of pieces. In the present work, we explicitly restrict 
our attention to the classical penalty given by the number of pieces as in ([2]), noting that 
this corresponds to an ansatz which is currently popular in the signal community. One of 
the main reasons is the connection to sparsity. The generalization of the proofs in this paper 
is straightforward but would be rather technical an thus might obscure the main ideas. 
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We address first noise and its projections to tlie approximation spaces, see Section 
In Section [4j we derive convergence rates in the general context. Finally, in Section 5 
we illustrate the abstract results by specific applications. Dimension 1 is included, thus 



generalising the results from L. BoYSEN et al. (2009) to piecewise polynomial regression 



and piecewise Sobolev classes. Our two-dimensional examples, wedgelets and Delaunay 
triangulations, both rely on a geometric and edge-preserving representation. Our main 
motivation are the optimal approximation properties of these methods, the key feature to 
apply the previous framework being an appropriate discretization of these schemes. 



2 The Setting 

In this section we introduce the formal framework for piecewise smooth representations, the 
regression model for data, and the estimation procedure. 

2.1 Regression and Segmentations 

Image domains will generically be denoted by S. We choose S°° = [0, 1)'^, d G N, as the 
continuous and S"" = {1, . . . ,n}'^ as the generic discrete image domain. Let / g L'^{S°°) 
represent the 'true' image which has to be reconstructed from noisy discrete data. For the 
latter, we adopt a simple linear regression model of the form 

n" = /r +e, neN, seS". (3) 

The noise variables ^" in the regression model are random variables on a common probability 
space (J7, ^,P). /" = (./"s)seS'" is a discretisation of /. To be definite, divide into n'^ 
semi-open cubes 

In.....;, = n K'J - l)/"'*:/-/^), 1 < < 

l<j<d 

of volume and for g £ L'^{S°°) take local means 

This specifies maps (5" from L'^{S°°) to R^" by 

S''9={-97)seS'^. (4) 
Conversely, embeddings of M'^" into L'^{S°°) are defined by 

As an aid to memory, keep the following chain of maps in mind: 

L'^{S°°) A M^" A L\S°°). 

In absence of noise, / is approximated by the functions i"/" = iP-S"^ f in any precision. 
The main task thus will be to control noise. In fact, the function L^S"f = t"/" is the 
conditional expectation of / w.r.t. the ((T)-algebra generated by the cubes /" and 
convergence can be seen by a martingale argument. 
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We are dealing with estimates of / or rather of /" on each level n. An image domain 
iS* will be partitioned by the method into sets, on which the future representations are 
members of initially chosen spaces of smooth functions. To keep control, we choose a class 
^ G 2^ of admissible fragments and later on, these will be rectangles, wedges or triangles. 
A subset ^ C 2'^ is a partition if (a) the elements in 3^ are mutually disjoint, and (b) S is 
the union of all P e We will only consider partitions ^ C In addition, we choose a 
subset Cp of all partitions and call its elements admissible partitions. 

For each fragment P G we choose a finite dimensional linear space of real 
functions on S which vanish off P. Examples are spaces of constant functions or polynomials 
of higher degree. This space is determined by the maximal local smoothness of /. If G *p 
and fge = {fp)pe^ is a family of such functions, we also denote by the function defined 
on all of S and whose restriction to P is equal to fp for each P £ 3^. The pair fge) is 
a segmentation and each element (P, fp) is a segment. 

For each partition ^ , define the linear space — spanj^p : P e ^^}. A family of 
segmentations is called a segmentation class. In particular, let 

eep,?) ■.^{{^,f):VeVJ e^.^} 

with partitions in *p and functions in 5^ = {^,^ : ^ € 



2.2 Complexity Penalized Least Squares Estimation 

We want to produce appropriate discrete representations or estimates of the underlying 
function / on the basis of random data Y from the regression model ([s]). We are watching 
out for a segmentation which is in proper balance between fidelity to data and complexity. 

We decide in advance on a class & of (admissible) segmentations which should contain 
the desired representations. The segmentations, given data Y", are scored by the functional 

i/^ : 6" X M^" M, ((^, /,^), r") =71^1 + ||/^ - Y^^ (6) 

with 7 > and |^| the cardinality of The symbol || • || denotes the ^^-norm on . 
The last term measures fidelity to data. The other term is a rough measure of overall 
smoothness. As estimators for / given data Y we choose minimisers (^?^", /") of Note 
that both and /" are random since Y" is random. 

The definition makes sense since minimal points of ^ do always exist. This can easily 
verified by the reduction principle, which relies on the decomposition 



min = min 7|^I + min \\f.^~Y\ 

Given the inner minimisation problem has as unique solution the orthogonal projection 
of Y to c^,32. The outer minimisation problem is finite and hence a minimum of ^ 
exists. Let us pick one of the minimal points 



3 Noise and its Projections 

For consistency, resolutions at infinitely many levels are considered simultaneously. Fre- 
quently, segmentations are not defined for all n g N but only for a cofinal subset of N. 
Typical examples are all dyadic partitions like quad-trees or dyadic wedgelet segmentations 
where only indices of the form n = 2^ appear. Therefore we adopt the following convention: 
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The symbol M denotes any infinite subset of N endowed with the natural order < . 

(M, <) is a totally ordered set and we may consider nets (x„)„gM- For example 
n £ M, means that x„ convergences to x along M. We deal similarly with notions like 
limsup etc. Plainly, we might resort to subsequences instead but this would cause a change 
of indices which is notationally inconvenient. 



3.1 Sub-Gaussian Noise and a Tail Estimate 

We introduce now the main hypotheses on noise accompanied by a brief discussion. The 
core of the arguments in later sections is the tail estimate ([s]) below. 

As Theorem [2] will show, the appropriate framework are sub- Gaussian random variables. 
A random variable ^ enjoys this property if one of the following conditions is fulfilled: 

Theorem 1 The following two conditions on a random variable ^ are equivalent: 

(a) There is a G M such that 

E(exp (t^)) < exp{a'^t^/2) for t > (7) 

(b) ^ is centred and majorized in distribution by some centred Gaussian variable rj, i.e. 

there is cq > such that P(|^| > c) < P(|r/| > c) for all c > cq. 
This and most other facts about sub-Gaussian variables quoted in this paper are verified in 



one may also consult V.V. Petrov (1975), Section III.§4 



the first few sections of the monograph V.V. BULDYGIN and Yu.V. Kozachenko (20001; 



The definition in (a) was given in the celebrated paper Y.S. Chow ( 1966 1 which uses the 



term generalized Gaussian variables. The closely related concept of semi-Gaussian variables 



which requires symmetry of ^ - seems to go back to J. P. Kahane (1963) 



The class of all sub-Gaussian random variables living on a common probability space 
(f?,j2/,P) is denoted by Sub(f?). The .sub-Gaussian standard is the number 

r(r/) = inf{a > : a is feasible in ([t])}. 

The infimum is attained and hence is a minimum. Sub(i7) is a linear space, r is a norm 
on Sub(i7) if variables differing on a null-set only are identified. (Sub(J7),r) is a Banach 
space. It is important to note that Sub(f2) is strictly contained in all spaces Lq{S1), p > 1, 
the spaces of all centred variables with finite p*^ order absolute moments. 

Remark 1 The most prominent sub-Gaussians are centred Gaussian variables r] with stan- 
dard deviation a and T{r]) = a. For them inequality ^ is an equality with a = a. The 
specific characteristic of sub- Gaussian variables are tails lighter than those of Gaussians, 
as expressed in (b) of Theorem^ 

The following theorem is essential in the present context. 

Theorem 2 For each n G M, suppose that the variables s G *5'", are independent. Then 
(a) Suppose that there is a real number /? > such that for each n G M and real numbers 
Us , s d , and each c G R+, the estimate 



> c < 2 • exp 



(8) 
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holds. Then all variables ^" are sub-Gaussian with a common scale factor /3. 
(h) Let all variables ^" be sub-Gaussian. Suppose further that 

13 ^2- sup{T^{C) : n e M, s e 5"} < oo. (9) 

Then (a) is fulfilled with this factor /3. 

This is probably folklore. On the other hand, the proof is not straightforward and therefore 
we supply it in an Appendix. 

Remark 2 For white Gaussian noise one has t(^") = a and hence /3 = 2(t^. 
3.2 Noise Projections 

In this section, we quantify projections of noise. Choose for each ri e M a class C 2^" of 
admissible segments over S'" and a set *P" of admissible partitions. As previously, for each 
P S a linear function space is given. We shall denote orthogonal L^-projections 
onto the linear spaces = spanj^p : P G £P} by tt^. 

The following result provides L^-estimates for the projections of noise to these spaces, 
as there are more and more admissible segments. 

Proposition 1 Suppose that dim^p < D for all n G M and each P S Assume in 

addition that there is a number M > such that for some k > 

|^"| > M -n"^ eventually. 

Then for each C > (1/k + 1)PD and for almost all u Q 

||7r^„^"(w)||^ < C|^"| ln(|^"|) for eventually allneM and each ^" G 

This will be proven at a more abstract level. No structure of the finite sets S*" is required. 
Nevertheless, we adopt all definitions from Section [l] mutatis mutandis. All Euclidean 
spaces M.^ will be endowed with their natural inner products ( •, • ) and respective norms. 
Projections onto linear subspaces will be denoted by tt^. 

Theorem 3 Suppose that the noise variables ^" fulfill ^ accordingly. Consider finite 
nonempty collections io" of linear subspaces in M.^ and assume that the dimensions of all 
subspaces .Jif G 55", n G are uniformly bounded by some number D € N. Assume in 
addition that there is a number M > such that for some k > 

I Jo" I > M ■ eventually. 

Then for each C > (1/k + l)/3-D and for almost all u G Q 

< Cln(|Jo"|) for eventually all neM, and each ^ G i^" . 

Note that || • || is Euclidean norm in the spaces E"^", since each f"(w) is simply a vector. 
The assumption in the theorem can be reformulated as = 0{n^'^)\. 

Proof. Choose n G M and G io" with dimJ^ — dn. Let ct, 1 < i < dn he some 
orthonormal basis of . Observe that for any real number c > 0, 

X;i(rM,e.)P>c2ln|i3"| 
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implies that 



\{C{^),ei)\^ > for at least one i = l,...,d„. 

d„ 



We derive a series of inequalities: 

< v([j{\{C\e,)\'>f\r,m}] <£p(|(r,e,)|^>^ln|io"|) 



>c(ln|^"|/d„)i/' 



where the first inequality holds because of the introductory implication. By ^ we may 
continue with 

Therefore 



P(|k^ril' >c'ln|.^"|) <27^ <2I?^ 



A/ 

For C — > {1/k + 1)PD the negative exponent becomes larger than 1 and the sum 
becomes finite. Enumeration of each Sf^ and subsequent concatenation yields a sequence of 
events. The Borel-Cantelli lemma yields 

P(lk^ni > C^ln \S)n\ for finitely many (n, J^) with ^ G ij") = 1. 

This implies the assertion. □ 

Now let us now prove the desired result. 

Proof Proof of Proposition [l| We apply Theoremjsjto the collections io" = {■^r '■ R G 
^"}. Then = |^"|. Since for each ^" e qj" the spaces F G are mutually 
orthogonal, one has for z E M.^ that 

\\TT^nz\\^ = E II^^P^Il^ 

and hence for almost all w e i7 



Ik^'-flw)!!^ < ^ C • In |^"| = C • |^"| • In for eventually all n € M. 

This completes the proof. □ 
Let us finally illustrate the above concept in the classical case of gaussian white noise. 
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Remark 3 Continuing from Remark [2j we illustrate the behaviour of the lower bound for 
the constant C in Proposition [l] and Theorem [3] in the case of white gaussian noise and 
polynomially growing number of fragments, i.e. is asymptotically equivalent to n". In 
this case the estimate for the norm of noise projections takes the form 

|K5^-r(w)f < (^ + l^n2(j'^D\^'^\\nn={l + K)2(j^D\^'"\\nn, 

for almost each uj eventually. 

This underlines the dependency between the noise projections, the number of fragments, 
the noise variance, the dimension of the regression spaces and the size of the partitions. 



3.3 Discrete and Continuous Functionals 

We want to approximate functions / on the continuous domain S°° — [0, 1)'' by estimates 
on discrete finite grids S*". The connections between the two settings are provided by the 
maps i" and 5", introduced in Q and ([5|. Note first that 

= (x,y)/|S'"| and ||t"a;|p = ||a;||V|S'"| for y e M^" , (10) 

where the inner product and norm on the respective left hand sides are those on L?{S°°) 
and on the right hand sides one has the Euclidean inner product and norm. Furthermore, 
one needs appropriate versions of the functionals ([6]). Let now 6" be segmentation classes 
on the domains S*" and 6 D i"(3" a segmentation class on . Set 

I 00 otherwise. 

The two functionals are compatible. 

Proposition 2 Let n e M and i^"- , g g^^) £ and z" e K"^" . Then 

F^" (z" , ( ^" , ) ) = i/:; z" , ( ^" , ) ) . 
//, moreover, f G L'^{S°°) then 

(^",5§^„) e argmin77^((5"/,-) if and only g^,™) € argmin •) 



Proof. The identity is an immediate consequence of (10 1. Hence let us turn to the 
equivalence of minimal points. The key is a suitable decomposition of the functional 
•). The map l^S"' is the orthogonal projection of L^{S°°) onto the linear space 
Jff^ = span{l/.j:i<ij<„}, and for any {^,h) S (."6" the function h is in Jf". Hence 

11/ - + 7|^| = 11/ - ,"r/||2 + ||,'M"/ - h\\^ + 7|^|. 

The quantity ||/ — i"i5"/|P does not depend on {£P,h). Therefore a pair {^,h) minimises 

Wf-L-S'^ff + WL-y^f-hf+jl^l 

if and only if it minimises 

||i"(5"/ ~hf+ 7|^| = HJ^iL^'S'^f, h)). 
Setting z" = (5"/ in ([2|, this completes the proof. □ 
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3.4 Upper Bound for Projective Segmentation Classes 



We compute an upper bound for the estimation error in a special setting: Choose in advance 
a finite dimensional linear subspace ^ of L^{S°°). Discretization induces linear spaces 
6"^ = {5"/ : f G^} &nd^^ = {Ip ■ g : g e (5"^}, for any P C S'", of functions on S'". 
Let further for each n £ M, a set of admissible fragments and a family *P" of partitions 
with fragments in be given. Set := {"^^ : £^ £ The induced segmentation 

class 

6"(*p", C5") = {(^3", /) : ^ e / e ^^^} 

will be called projective (^-) segmentation class at stage n. 

The following inequality is at the heart of later arguments since it controls the distance 
between the discrete M-estimates and the 'true' signal. 

Lemma 1 Let for n G M a '^^ -projective segmentation class 6" over S"' be given and choose 
a signal f e L'^{S°°) and a vector ^" g E"^" . Let further 

(^",r)e argmin H;iS^f + CA^,h)) 

and {^,h) e 6". Then 

1 fi 

IWr -ff< 27(1^1 - l^"l) + 3\\i-h f\f + (||7r^„rir + ll^^ni') • (11) 



Proof. Since (^F", /") is a minimal point of H!^{S"f + £,",■)) the embedded segmentation 
/") is a minimal point of + ')) by Proposition [i] and hence 

7i^"i + IK^"/" - /) - i"rii' < 7i^i + - /) - ^"rir. 

Expansion of squares yields that 

7i^"i + iii"r - /II' + 2(i"r - /, + 
< 7i^i + - /IP + 2(t"/i - /, t"r> + ik'TiP 

and hence 

lit"/" - /IP < 7(1^1 - |^^"|) + ||i"/i - /IP + 2(i"/i - i"/", (12) 

By definition h e and /" € which implies that /i - /" e ^' = span(^",^^) 

and hence 7r^/(/" ~ h) — f" — h. We proceed with 

i(t"/.-t"/",t"r)i - i5"ri|(7r^'(/"-/i),r)i = i5"ri|(/i-r,^^,r)i 
< i5"r^/'ii7r,^'rii • Ik"/" - /II + i^"r'/'ik^'rii • 11/ - ^"/^ii- 

bmce we conclude 

- i"/", i"C")l < Hi"/" - i"/i||V4 + 11/ - t"/i||V4 + 2||7rjr'ril Vl'S"| 

< ii."r - i"/iiiV4 + 11/ - t"/iiiV4 + 4 (ii7r^„riP + ik^riP) /i^"i 
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Putting this into inequaUty (12 1 resuhs in 

Ik"/" - /IP < - i^"i) + Kh - ff + ~ nv^ + 11/ - ^"/^ii V2 
+ 8(ii7r^„rii' + ik^riP)/i^"i, 

which imphes the asserted inequahty. 



□ 



4 Consistency 

In this section we complete the abstract considerations and summarize the preliminary 
work in two theorems on consistency. The first one concerns the desired i^-convergence of 
estimates to the 'truth', and the second one provides convergence rates. 



4.1 L^-Convergence 



We will prove now that the estimates of images converge almost surely to the underlying true 
signal in L'^{S°°) for almost all observations. We adopt the projective setting introduced 
in Section [3.4[ Let us make some agreements in advance. 



Hypothesis 1 Assume that 
(1^1) there are k > and C > such that |^"| > C ■ n'^ eventually, 

(1^2) there is a real number /3 > such that, for each ri € M and real numbers /i^, s G 5'", 
and each c G R+, the inequality 



> c < 2 • exp 



holds. 



(1^3) the positive sequence (7„)„gN satisfies 



In |^"| 

7„ ^ and 7„ > C • — , , — , for eventually all n 

P"| 

with C ~ j3D{K + 1)/k, and D is, like in Proposition^ an upper bound for the 
dimension of the linear spaces . 

Remark. Note that the condition 7„ • |S'"|/lnn — >■ oo implies the second part of ( Ep]3) 
by (EQI). It was used for example in |F. Friedrich| ( |2005[ ) or |L. BoYSEN et al] ( |2009[ 



2007| ). 



Given a signal / G L'^{S°°) we must assure that our setting actually allows for good 
approximations of / at all. If so, least squares estimates are consistent. 



Theorem 4 Assume that Hypothesis^ holds. Let f E and suppose 



lim limsup inf - /|| = 0. 



(13) 



The 



lk"/"H-/ll 



as n oo for almost all uj £ f2. 
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We formulate part of the proof separately, since it will be needed later once more. 
Lemma 2 We maintain the assumptions of Theorem^ Then, given k > 0, 

||t"/"(a;) - /IP < 37„ • fc + 3||i"/i - f\\^for all (Q, h) £ 6" such that \Q\ < k (14) 
eventually for all n £ N and for almost all uj £ fi. 

Proof. Lemma [T] yields 

1 fi 

Ik"/" M - /IP < 27„ (1^1 - l^"l) + 3\W'h - /IP + ^ (Ik^.eiP + hMW) 

and application of Propositionjljimplies that for any real number C > ^^^j3D, the following 
inequality holds for almost all cj e 

||,"/"(c.) - /IP < 27„fc + 3||."/i - /IP + 16C' (^^^^^^ ■ [m + |^"|) - 27„ • |^"| 
< 27„fc + 3||."/i - /IP + leC'i^^fc + |^"| f 8C'i^^ - 27„' 



For 7„ satisfying Hypothesis (t|T]3), the term in parenthesis is negative. Therefore (14) 
holds and the assertion is proved. □ 

Theorem |4] follows now easily. 

Proof Proof of Theorem |4} The following formulae hold almost surely. Lemma [2] 
implies that, for 

Ik"/" - /IP < 37„ • fc + 3 • inf {\\L^h - /IP) eventually 

Therefore 

limsuplk"/" - /IP < limsup (37„ • fc + 3 • inf (|k"/i-/|P' 

\ (^,/i)e6'M^|<fc 

= + 3-limsup inf (|k"/i - /|p) 



By assumption (131, the right hand side converges to as fc tends to oo. Hence 

limsup|k"/"-/lP=0, 

n— ^oo 

which completes the proof. □ 
4.2 Convergence Rates 

The final abstract result provides almost sure convergence rates in the general setting. 



13 



Theorem 5 Suppose that Hypothesis^ holds and assume further that there are real numbers 
a, C > 0, g > 0, and a sequence {Fn)n<£N with lini„_^oo ^ri — oo such that 



■fcf 1 

for alln eM and k, and some h) e 6" with \^\ < k. 



Then 



for almost all u ^ Q. 



(15) 



(16) 



Proof. Let (fc„)„gM[ be a sequence in Recall from Lemma [2] that 

for sufficiently large n € M and any (=S, h) e ©" with < fc„ on a set of of full measure. 
The following arguments hold for all such w. We will write C for constants; hence the C 
below may differ. 



Since (a + h) < 2{a + 6 ), assumption (15) implies that 

Kr-f\?<c(-i^-h, + % + 



1 



(17) 



This decomposition of the error can be interpreted as follows: the first term corresponds to 
an estimate of the error due to the noise, the second term corresponds to the discretization 
while the third term can be directly related to the approximation error of the underlying 
scheme, in the continuous domain. 

One has free choice of the parameters k^ ■ We enforce the same decay rate for the first 



and third term setting 7„fc„ = Then, in view of (17) 



In 



2e 

2q + 1 



p2 



(18) 



To get the same rate for the discretisation and the approximation error set 

kl<^ 1 



or equivalently fc„ = Fn 



which, together with estimate (17), yields 



KF - ff <c\^nF^^- +F 

Straightforward calculation gives 



2e 

' 2a + l 



In*' > -^^^ if and only if 7„F„°+- > 



1 

77' a + e 



(19) 



Hence, the first term on the right hand side of inequality ( 18 ) dominates the second one if 



and only this holds in inequality (19). We discriminate between the two cases > and <. 
The first one is 



In 



> 



7r 



2e 

' 2a, + l 



i?2 



(20) 
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Combination with ( 18 1 results in 



||,"/"-/ll^<C-7«° 



(21) 



for some C > 0. In view of the equivalence, replacement of > by < in (201, results in 

1 _ 2a 



which, together with estimate (19), gives for some C > that 



Combination of (22) and (21 1 completes the proof of (16) 



(22) 
□ 



Remark 4 Let us continue from Remarkjs] If |^"| ~ n'^ and noise is white Gaussian with 
/3 — 2(7^ then Hypothesis (E(i]3) boils down to 



and 7„ > 2(k + l)^^!? 



In n 



Setting En — ajn'^l'^^ the estimate (16) then reads 



||."rH-/lP = 0((4|lne„|)^°°^ 

as long as the growth of F„ is sufficient. This is strongly connected with the optimal 
minimax rates from model selection, which bound the expectations of the left hand side, 



see for instance L. BiRGE and P. Massart (1997) 



5 Special Segmentations 

We are going now to exemplify the abstract Theorem [5] by way of typical partitions and 
spaces of functions. On the one hand, this extends a couple of already existing results and, 
on the other hand, it illustrates the wide range of possible applications. 



5.1 One Dimensional Signals - Interval Partitions 

Albeit focus of this paper is on two or more dimensions, we start with one dimension. 
There are at least two reasons for that: illustration of the abstract results by choices of 
the (seemingly) most elementary example, and to generalize results like some of those in 



L. BoYSEN et al. (2009 2007) to classes of piecewise Sobolev functions. 

To be definite, let 5" = {1, . . . , n} and let = {[«, j] : 1 < i < j < n} be the discrete 
intervals of admissible fragments. Then is the collection of partitions of S*" into intervals. 
Plainly, |^"| = (n + l)n/2 and = 2""-'^. We deal with approximation by polynomials. 
To this end and in accordance with Section |3.4[ we choose the finite dimensional linear 
subspace C i^([0, 1)) of polynomials of maximal degree p. The induced segmentation 
classes 6"(*P",5^") consist of piecewise polynomial functions relative to partitions in CP". 

The signals to be estimated will be members of the fractional Sobolev space VF"'^((0, 1)) 



of order a > 0. The main task is to verify Condition ( 15 ). Note that this class of functions 



15 



is slightly larger than the classical Holder spaces of order a usually treated. For results 
in the case of equidistant partitioning, we refer, for instance, to L. Gyorfi et al. (2002) 
Section 11.2. 

For the following lemma, we adopt classical arguments from approximation theory. 

Lemma 3 For any f e iy"'^((0, 1)), with p < a < p + 1, there is C > such that for all 
k <n £N, there is (^^, /iJJ, ) G 6", such that < k and which satisfies 

||/-^"/^fc||<C.f^ + ^') (23) 



n 



For the proof, let us introduce partitions = {[(« — 1)/^, i/k) : z = 1, • • • , fc} of [0, 1) 
into k intervals, each of length 1/k. 

Proof. Let / e W^"'^((0, 1)). From classical approximation theory (see e.g. [T3], Chapter 
12, Thm. 2.4), we learn that there is C > such that there is a piecewise polynomial 
function of degree at most p such that 

For each i = 1, . . . , A:, let hk.i denote the restriction olhk to = {{i~l)/k,i/k). We consult 
the Bramble- Hilbert lemma (for a version corresponding to our needs, we refer to Thm. 6.1 
in [T7]) and find C > 0, such that 

1/ - hk,i\w^.^iii) < C ■ for each i = 1, . . . 

This yields for some C > 0, independent of k and n, that 

|/ife,i|vKi'2(7,) < 1/ - + < C ■ for alH = 1, • • • ,k. 

We turn now to the piecewise constant approximation on the partition We split [0, 1) 
into the union JJ^ of those intervals in ^„ which do not contain knots i/k and the union 
KJ^ of those intervals in ^„ which do contain knots i/k. For I £ J^^ and / C JJ}, we have 

\hk,i\w^.Hi) < C|/Im/i.2(/)| if and only if |/ifc,,|i2(j) < • |/'li2(j). 
This implies 

X! \^k,i\h{i) < C'^ X! l/'li^C/) - C'^l/'U=([o,i])' 
which in turn leads to 

|/ifc|vKi.2(jfc) < C'^|/|W1.2((0,1))- 

Hence we are ready to conclude that for some constant C > 0, 

\\hk-i''S''hk\\mj^)<C/n. (24) 
For / e A and / C K]^, we use the fact that /i^ < 2C • ||/||l=o([0, 1]) and deduce 

||/ifc-t"<5"/ifc|U2(,) <2C||/|Uoo(,)/n. 
Summation over all intervals included in KJ} results in 

Whk-i'^S'^hkWmKi^) <C-k/n. 
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This yields for the entire interval [0, 1) that 

11/ - t"<5"/ife|| <\\f-hk\\ + \\hk- L-S'^hkW < C f ^ + 

With — S"hk, this completes the proof. □ 

Piecewise smooth functions have only a very low Sobolev regularity. Indeed, recall 
that piecewise smooth functions belong to VF"'^((0,1)) only for a > 1/2. In order to 
overcome this limitation, we consider a larger class of functions, the class of piecewise 
Sobolev functions. 



Definition 1 Let a > 1/2 be a real number, J € N, and Xq — < Xi < ■ ■ ■ < xj+i = 1. 
A function f is said to be piecewise VF"'^([0, 1]) with J jumps, relative to the partition 
{[xi,Xi+i) : i = 1, • • • , J} if 



Remark. Definition [T] is consistent, due to the Sobolev embedding theorem. For an 
open interval / of M, W"''^{I) is continuously embedded into {I""), the space of uniformly 
continuous functions on the closure 1°" of /. 

We conclude from Lemma [H 

Lemma 4 Let f be piecewise-W°'''^{[0, 1)) with J jumps and with p < a < p + 1. Then 
there are C > and i^k,hl) G such that \^"\ < k and 

ii/-^iNi<c.(i, + ^ + ;J). (25) 



Proof. With the same arguments as in the previous proof we just have to include the 
error made at each jump of the original piecewise regular function. More precisely, we use 
a similar splitting into JJ! and KJ^ where KJ^ also contains the intervals containing Xi for 
i = 1, • • • , J. Since there are at most k + J intervals in KJ^, this gives estimate (25 1. □ 



By Lemma |4] a piecewise Sobolev function satisfies Condition ( 15 1 with p — I and 
Fn = n and therefore Theorem [5] applies. In summary 

Theorem 6 Let f be a piecewise VK"'^([0, 1]) function, such that < a < p + 1, where p 
is the maximal degree of the approximating polynomials. We assume further that (1^3) 
holds and that the noise variables ^" from Section 2.1 satisfy Then 



KPiuj) - ff ^ O 7^"+' , for almost all w £ 12. 



(26) 



Proof. Let us check the assumption in Theorem [Sj Since — {n — l)n/2, Hypothesis 
(1^1) holds with K — 2. Hypothesis (1^2) and (1^3) were required separately. Finally, 

I by Lemma |4] Finally, Hypothesis ('^{I}^^ 

□ 



Condition (151 holds with g = 1 and Fn 



completes the proof. 
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Let ^-^([0,1]) denote the set of continuously difFerentiable functions. For p € N, a S 
{p,p + 1], a function / S "^^([0, 1]) is said to be a-Holder if there is C > such that 

\fP\x)-f^P\y)\<C\x-yr-P for any x,y G [0,l],x ^ y. 

The hnear space of a-H61der functions will be denoted by "^"([0, 1]) if a G N and '^"-^'^([0, 1]) 
if a G N. 

Remark. Choose 7„ = Clnn/n with large enough C, independently of /. Then the 
almost sure estimates ( 26 ) of the estimation error simplifies to 

\\lT{uj)- f\\^ for almost all a; G 12. (27) 



These convergence rates are, up to the logarithmic factor, the optimal rates for mean 
square error in the Holder classes "^"([0, 1]). Thus, our estimate adapts automatically to 
the smoothness of the signal. 



5.2 Wedgelet Partitions 



Wedgelet decompositions are content-adapted partitioning methods based on elementary 
geometric atoms, called wedgelets. A wedge results from the splitting of a square into two 
pieces by a straight line and in our setting a wedgelet will be a piecewise polynomial function 
over a wedge partition. The discrete setting requires a careful treatment. We adopt the 
discretization scheme from F. Friedrich et al. (2007), which relies on the digitalization 



of lines from J. Bresenham (1965). This discretization differs from that in D. DoNOHO 



( 1999 ), where all pairs of pixels on the boundary of a discrete square are used as endpoints 
of line segments. One of the main reasons for our special choice is an efficient algorithm 
which returns exact solutio ns of the functional ([6| . It relies on rapid moment computation, 
based on lookup tables, cf. IF. Friedrich et all (|2007|). 



Wedgelet partitions 

Let us first recall the relevant concepts and definitions. Only the case of dyadic wedgelet 
partitions will be discussed. Generalisations are straightforward but technical. 

We start from discrete dyadic squares S"^ = {1, . . . , m}^ with m G M = {2^ : p G No}. 
Admissible fragments are dyadic squares of the form 

[(^-l)•2^^•2«)x [(j-l).2«,j.2«), 1 < z, j < 2^-", < g < p. 

The collection of dyadic squares can be interpreted as the set of leaves of a quadtree where 
each internal node has exactly four children obtained by subdividing one square into four. 
Digital lines in are defined for angles i? G (— 7r/4, 37r/4]. Let 

d(,9)=max{|cos^|,|sin^|}, vi^) ^ { ^-''f^'^'^^J, if | cos^| > | sin^l 
^ ' ^' ' ' ^ ' [ (smiy, — cos w) otherwise 

The digital line through the origin in direction is defined as 

Ll = {seZ^ : -d{§)/2 < {s,v{d)) < d{'d)/2}. 
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Lines parallel to L° are shifted versions 

£S = {.s e Z2 : (r - l/2)d{d) < {s, v{^)) < {r + l/2)d{^)} 

with the line numbers r G Z. One distinguishes between flat lines where cosi? > sin?? and 
steep lines where cosi? < sind. For a: S E, set round(x) — max{i G Z : i < x + 1/2}, 
let y-g(x) = round(a; • tan??) and xs{x) = round(?/ • cot??). According to Lemma 2.7 in 



F. Friedrich et al. (2007), 



= (0, r) + {(x, y^(a;) : X e Z)} for flat lines, 
— (i^i 0) + {{x^{y),y : y G Z)} for steep lines. 

By Lemma 2.8 in the same reference, all parallel lines partition Z^. We are now ready to 
define wedgelets. Let Q be a square in Z^ and * ^^^^ with H Q 7^ and L^'^^ n Q 7^ 0. 
A wedge split is a partition of Q into the lower and upper wedge, respectively, given by 

w^/=\jL'^nQ, M^r^y^nQ. (28) 

k<r k>r 

Let ^ be a partition of some domain S"" into squares. Then a wedge partition of S*™ 
is obtained replacing some of these squares by the two wedges of a wedge split. It is called 
dyadic if to G M, and the squares Q G =S are dyadic. 

We assume that a finite set of angles is given. The set of admissible segments 



consists of wedges obtained by wedge splits of dyadic squares, given by ( 28 ) and for 9 £ 0, 
or by dyadic squares. 

Focus is on piecewise polynomial approximation of low order. The induced segmentation 
classes 6"* consist of piecewise polynomial functions relative to a wedgelet partition. The 
cases of piecewise constant (original wedgelets) and piecewise linear polynomials (platelets) 
will be treated explicitly. 

Wedgelets and approximations 



We first recall some approximation results for wedgelets. They stem from D. DoNOHO 



(1999) and R. Willett and R. Nowak (2003). Since we are not working with the 



same discretisation we rewrite them for the continuous setting and provide elementary 
self-contained proofs. The discussion of the discretisation is postponed to Section [5?2} We 
start with the definition of horizon functions, like in D. DoNOHO| ( T999| . 



Definition 2 (Horizon functions) Leta G (1,2] andhe "^^"([0,1]) ifa < 2 or '^^^^([0, 1]) 
if a = 2. Let further f he a bivariate function which is piecewise constant relative to the 
partition of [0, 1]^ in an upper and a lower part induced by h: 



fix,y) 



ci ify< h{x), 
C2 ify> h{x), 



with real numbers ci and C2. Such a function is called an a-horizon function; the set of 
such functions will be denoted by iJor"([G, 1]^). h is called the horizon boundary of f . 

Discretisation at various levels of a typical horizon function is plotted Fig.[2j left column. 
In the right column the respective noisy versions are shown. 
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Figure 2: Left: 5"/, for n ~ 64, 128, 256, respectively, where / is a horizon function, 
according to Definition [2j Here, the horizon boundary is in "^"((0, 1)) and a — 1.5. Right: 
Respective noisy images S"f + 
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Lemma 5 Let a G [1,2] and f G iJor"([0, 1]^) with boundary function h. Then there are 
C, C" > - independent of k - and for each k a continuous wedge partition Wk of the unit 
square [0,1]^, such that |#fe| < C'k and 

C 

ll/-/felU^([oa]2) < 

where fk is the -projection of f on the space of piecewise constant functions relative to 
the wedge partition 

Proof. Let us first approximate the graph of h by linear pieces. We consider the uniform 
partition induced by Xi — i/k. We denote by Sk{h) the continuous linear spline interpolating 
h relatively to the uniform subdivision: 



Skih){x) = h{xt) + (x - Xt) 



h{xi+i) - h(xi) 



Xi 

where li — [xi,XiJ^i\. Therefore, we have 



for i = Q, . . . ,k — \ and x G li 



\h{x) ~ Skmx)\ - 



Hx)-h[x^)-^-^'^^±^l^^ix-x.) 



Since h' G "^"'"-^[0, 1]), there exists C > such that 
h{xi+i) - h{xi) 



•^i-fl Xi 



- h'{xi) 



< C\x^+i - Xi\ 



a-l 



for each x £ L. 



C 



(29) 



This implies that 

\h{x) - Sk{h){x)\ 
On the other hand, 



h{x) - h{xi) - h\x,) + O 



k°'- 



(x - Xi) 



for X € L. 



h{x) — h{xi) + h'(xi){x — Xi) + 0{\x — Xi\"). 



Hence, Equation ( 29 ) can be rewritten as 

\h{x) - Sk{h){x)\ = 0{\x - x,n + O 

and there is a constant C > (independent of k) such that 

C 



1^- '5'fe(/i)||ioo([o,i]) < 



Now we will use this estimate to derive error bounds for the optimal wedge representation. 
As a piecewise approximation of / wc propose 



fkix,y) 



ci if y < Sk{h){x); 
C2 if ?/ > Sk{h){x). 
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We bound the error by the area between the horizon h and its piecewise affine reconstruction: 

11/ - /fc||L2([oa]2) < |ci - C2I (^J \h{x) ~ Sk{h){x)\ dx^ 

C 



< |ci - C2I {\\h- Sk{h)\\L'^(^[o.i])) 



1/2 



< 



fca/2- 



It remains to bound the size of the minimal continuous wedgelet partition Wk, such that 
fk e ^-Tfe- A proof is given in Lemma 4.3 in|D. DoNOHo| (|1999[); it uses h e '^^{[0, 1]). □ 



Remark. For an arbitrary horizon function, the approximation rates obtained by non- 
linear wavelet approximation (with sufficiently smooth wavelets) can not be better than 



11/ ~ /fc||L2([0,l]2 



O 



1 



fcl/2 



where fk is the non-linear fc-term wavelet approximation of /. This means that for such a 
function the asymptotical behaviour in terms of approximation rates is strictly better for 
wedgelet decompositions than for wavelet decompositions. For a discussion on this topic, 



see Section 1.3 in E. Candes and D. DoNOHO (2002). 



Piecewise constant wedgelet representations are limited by the degree of the regression 
polynomials on each wedge. This is reflected by the choice of the horizon functions which 
are not only piecewise smooth but even piecewise constant. A similar phenomenon arises 
also in the case of approximation by Haar wavelets. 



R. WiLLETT and R. Nowak (2003) extended the regression model to piecewise linear 



functions on each leaf of the wedgelet partition and called the according representations 
platelets. This allows for an improved approximation rate for larger classes of piecewise 
smooth functions. 

Let ft, be a function in "^([0, 1]). We define the two subdomains and , respectively, 
as the hypograph and the epigraph of h restricted to (0, 1)^. In other words: 



S+ = {{x,y) e (0,1)2 I y > ^ g- ^ K^^y) ^ lf\y< h{x)} . 

Let us introduce the following generalised class of horizon functions: 

HormO, I?) ■■= {/ : [0, 1]' ^ M| and f\s- G <^"(5±), h e -^"([0, 1])}. 



(30) 



(31) 



The following result from R. Willett and R. NoWAK ( |2003 ) gives approximation rates 
by platelet approximations for Hor". 

Proposition 3 Let f e Hor^{[0, 1]) for 1 < a < 2. Then the k-term platelet approxima- 
tion error hk satisfies 

\\f-hk\\LW)=o(^^y (32) 
Proof. A sketch of the proof is given by the following two steps: (1) the boundary between 



the two areas is approximated uniformly like in D. DoNOHO (1999); (2) in the rest of the 



areas we use also uniform approximation with dyadic cubes, together with the corresponding 
Holder bounds. The partition generated consists of squares of sidelength at least 0{l/k^/^). 
There are at most 0{k) such areas. □ 
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Wedgelets and consistency 

Now we apply the continuous approximation results to the consistency problem of the 
wedgelet estimator based on the above discretization. Note that, due to our specific dis- 
cretization, the arguments below differ from those in D. DoNOHO (1999). 

Two ingredients are needed: pass over to a suitable discretisation and bound the number 
of generated discrete wedgelet partitions polynomially in n, in order to apply the general 
consistency results. Let us first state a discrete approximation lemma: 

Lemma 6 Let f be an a horizon function in i?orf with 1 < a < 2. There is C > such 
that for all k < n ^ N , there is {^J^, h^,) ^ such that \^J!\ < k and which satisfies 

Il/-^"'^l-ll^^-(fci7^ + ^)- (33) 
Proof. The triangular inequality yields the following decomposition of the error 

||/-i"5"/lfe|| < \\f-hk\\ + \\ht,~L''6nhk\\. 



The first term may be approximated by ( 32 ) , whereas the second term corresponds to the 
discretisation. Let us estimate the error induced by discretisation. 

One just has to split [0, 1)^ into J^, the union of those squares in =S„ which do not 
intersect the approximating wedge lines and the union of such squares meeting the 
approximating wedge lines. We obtain the following estimates: 



C 

— '■"<5"/ifc|lL2(Q-| < for any Q G K'^, and for some constant C > 0. 



Since there are at most C'kn such squares, for some constant C" not depending on k and 
n, this implies that 

\\hk - i^5-hk\?mK-^ < ^ = - and - I'^S^KWl^j^. < — , 

where C > is a constant. Taking ft,)! ~ (5"ftfe completes the proof. □ 

Finally, the following lemma provides an estimate of the number of fragments in i^". 

Lemma 7 There is a constant C > such that for all n €z M. the number \^"\ of fragments 
used to form the wedgelet partitions is bounded as follows: 

Proof. In a dyadic square of size j, there are at most possible digital lines. For dyadic 
n e M one can write n = 2"^ and therefore we have 

J J 22J+2 _ 1 

\^"\<Y^ 22("'-*) • 2^-^' ^ri^Yl 2^'' = ■ ^2 j— <C-n^ for some constant C > 0. 

This completes the proof. □ 

Note that the discretisation of the continuous approximation hk leads to a wedgelet 
partition composed of fragments in Therefore, combination of the Lemmata [7] and [6] 
yields: 
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Theorem 7 On S'^ — {1, ... , n}^ the following holds: Let a such that 1 < a < 2 and f be 
an a horizon function in Horf{[0, 1]^) with 1 < a < 2 and suppose that 7„ satisfy (1^3). 
Assume further that the noise variables ^" from Section 2.1 satisfy The 



len 



ll/", - = O (t""') + O {n-^) , for almost all w G 12, (34) 
where f^^ is the wedgelet-platelet estimator. 



Remark. Choosing 7„ of the order Inn/n , estimate (34 1 reads 



- ff = O ^ ^^'""ir^ j + O for ahnost ah uj e f2. (35) 

Whereas the left term on the right-hand size consists of the best compromise between 
approximation and noise removal, the right term on the right-hand size corresponds to the 
discretisation error. Note that, in contrast to the ID-case the discretisation error dominates, 
as soon as a > 1. This is related to the piecewise constant nature of our discretisation 
operators. In concrete applications, this may prove to be a severe limitation to the actual 



quality of the estimation. Up to this discretisation problem, the decay rates given by ( 35 ) 
are the usual optimal rates for the function class under consideration. 

On the left column of Fig. [3j wedgelet estimators for a typical noisy horizon function 
are shown. 

5.3 Triangulations 

Adaptive triangulations have been used since the emergence of early finite element meth- 
ods to approximate solutions of elliptic differential equations. They have been also used 



in the context of image approximation; we refer to L. Demaret and A. ISKE (2010) for 
an account on recent triangulation methods applied to image approximation. The idea to 
use discrete triangulations leading to partitions based on a polynomially growing number of 



triangles has been proposed in E. C ANDES (2005) in the context of piecewise constant func 



tions over triangulations. In the present example, we deal with a different approximation 
scheme, where the triangulations are Delaunay triangulations and were the approximating 
functions are continuous linear splines. One key ingredient is the use of recent approxima- 



tion results, L. Demaret and A. Iske (2012), that show the asymptotical optimality of 



approximations based on Delaunay triangulations having at most n vertices. Due to this 
specific approximation context, a central ingredient for the proof of the consistency is a 
suitable discretization scheme, which still preserves the approximation property. 

Continuous and discrete triangulations 

Let us start with some definitions. We begin with triangulations in the continuous settings: 



Definition 3 A conforming triangulation T of the domain [0, 1]^ is a finite set {T}TeT of 
closed triangles T <Z [0, 1]^ satisfying the following conditions. 



(i) The union of the triangles in T covers the domain [0, 1]^; 
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Figure 3: Estimators of the noisy images of Fig[2] Left: piecewise linear wedgelet estimator. 
Right: piecewise linear and continuous Delaunay estimators. 

(a) for each pair T, T' G T of distinct triangles, the intersection of their interior is empty; 
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(in) any pair of two distinct triangles in T intersects at most in one common vertex or 
along one common 



We denote the set of (conforming) triangulations by £'^{[0,1]'^). We will use the term 
triangulations for conforming triangulations. 

Accordingly we define the following discrete sets, relatively to partitions — {[(* — 
l)/k,i/k) X [(j — l)/k,j/k) : i,j — 1, • • • , fc} of [0,1)^ into k squares each of side length 
1/fc. 

For a, 6 e [0, 1]^ we denote by [a, b] the line segment with endpoints a and b. 

Definition 4 For a triangle T C [0,1]^, with vertices a,b and c, we define the following 
discrete sets: 

(i) for each p £ {a, 6, c} the square Q G cS„ such that Q 3 p is called a discrete vertex of 
T; 

(a) for each edge e G {[a&], [bc\, [ca]}, the set of squares Q G =S„ such that Q H e ^ $ and 
Q is not a discrete vertex is called a discrete (open) edge of the triangle T; 

(Hi) the set of squares Q G ^„ such that QC\T ^ % and Q is neither a discrete vertex nor 
belongs to a discrete open edge is called a discrete open triangle. 

Piecewise polynomials functions on triangulations 

We take S'" = {1, • • • ,?t.}^ and the set of fragments is given as the set of discrete 
vertices, open edges and open triangles 

^" = 5" U {{[ab]) ; a, 6 G 5"} U {{[abc]) ■.a,b,c€ S"-} . 

We let then be the collection of partitions of S*" into discrete triangles, obtained from 
a continuous triangulations, and assuming that there is a rule deciding to which triangle 
discrete open segments and discrete vertices belong. Each such discrete triangle is then the 
union of elementary digital sets in We remark that |^"| = n + n{n — l)/2 + n{n ~ 
l)(n — 2)/6 and therefore |^"| ~ /6. Like in the one-dimensional described in 

Section 5.1 
of maxima 

polynomial functions relative to partitions in 
We have the following approximation lemma 



we choose the finite dimensional linear subspace C L'^{[0, 1)) of polynomials 
degree p. The induced segmentation classes (3"(CP",5^") consist of piecewise 



Lemma 8 Let f e "^"([0,1]^), with p < a < p + I. There is C > such that for all 
k <n eN, there is (^^, /ij!) G 6", such that \^^\ < k and which satisfies 



||/-^"/i^ll<C.(^^+(-) ). (36) 

Proof. We first use classical aproximation theory which tells us the existence of a function 
hk ■ [0,1]^ I— >■ K, piecewise polynomial relatively to a triangulation with k triangles and 
such that the error on the whole domain is bounded by 

\\f~h,\\< ^ 
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As in the 1-D case we spht [0, 1)^ into the union of those squares in ^„ which do not meet 
the continuous triangulation, and K!^ the set of such squares meeting the triangulation, i.e. 
which intersects with some edge of the triangulation. For each small square Q G J2n and 
Q C K^^, the following estimate holds: 



c 



for any Q G K^, and some constant C > 
and there are at most 3 • \/2kn such squares. Altogether we obtain: 



\\hk — t"5"/ife||2,2(^fcj 



< 



,1/2 



for some constant C > 0. 



Now for each square Q e =S„ and Q C J^, an argumentation similar to that in the ID-proof 
yields 

\\hk-i''S"hk\\LHJz) < -■ 



This completes the proof. 



□ 



Due to Lemma [s] ([Ts]) is satisfied: a function in 'ra" satisfies (15) with p — 1/2 and 
Fn = ri}!"^ and therefore Theorem [H] applies. 

Continuous linear splines 

We turn now to the more subtle case of continuous linear splines on Delaunay triangulations. 
Anisotropic Delaunay triangulations have been recently applied successfully to the design of 



a full image compression/decompression scheme, L. Demaret et al. (20061, L. Demaret 



et al. (2009). We apply such triangulation schemes in the context of image estimation. 



We first introduce the associated function space in the continuous setting. We restrict 
the discussion to the case of piecewise affine functions, i.e. p = 1. 

Definition 5 Let T be a conforming triangulation of [0, 1]'^ . Let 

5^- {/e<r([0,i]2) :/|^e^i,Ter}, 

be the set of piecewise affine and continuous functions on T. 



The following piecewise smooth fimctions generalise the horizon functions from (31). 

Definition 6 Let a € (1, 2) and g e "^"([0, 1]). Let and S~ be two subdomains defined 
as in (30). A generalised a-horizon function is an element of the set 

where VF"'^(S'*) is the Sobolev space of regularity a relative to the L^-norm on . 

In order to obtain convergence rates of the triangulation-based estimators for this class 
of functions we need the following recent result, Thm.4 in L. Demaret and A. Iske (2012 ): 
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Theorem 8 Let f be an a-horizon function in Horf, with a € (1,2), such that f\s± S 
W"^'^{S^). Then there is C > 0, such that for all k e N there is a Delaunay triangulation 
Vk with 

C 

\\f - t^siJ\\l^{[o,iY) < 
Using arguments as in the proof of Lemma [H] we obtain the following lemma: 

Lemma Q Let f € ^^"'^([0, 1]^), with 1 < a < 2 there is C > such that for all k < 
n e N, there is {^J^, ^k)j such that € *P" is a discretisation of a continuous Delaunay 
triangulation T>k, | < fc, /ifc = S"hk, where hk G 5^ and which satisfies 



\f-i"hl\\<C^ 



fcl/2 



The previous machinery cannot be applied directly without an explanation: since we are 
dealing with the space of continuous linear splines, our scheme is not properly a projective 
^-segmentation class. However, for each fixed partition, € *p with elements in 
Sj- a subspace of . Observe that all arguments in Lemma [l] remain valid if we replace 
by subspaces and consider also the minimisation of the functional H" over functions 
in these subspaces. We can therefore apply Theorem [5] to obtain the equivalent of Theorem 

13 

Theorem 9 Let 1 < a < 2 and let f be a generalised horizon function in ^"([0, 1]^). Let 
further assume that noise in ^ satisfies Q and that 7„ satisfy (1^3). Then 

ll/"„ - /IP = O (7^) + O (n-*r) for almost all oj e f2, (37) 
where /"^ is the Delaunay estimator. 

Proof. We check the assumptions in Theorem Jsj Since |^"| is of the order {n"^)^, Hy- 
pothesis (1^1) holds with K = 3. Hypothesis (1^2) and (]^3) were required separately. 
Finally, (|l5|"^holds with g = 1/2 and F„ = n^^^ by Lemma ^ This completes the proof. □ 



Remark. Similarly to Remark 5.2 and choosing 7„ of the order Inn/n^, estimate (37 1 
reads 

ll/"„ - /IP = O i ^^^"^l^" J + o (y-^^ for almost all u e n. 



The discussion of Remark |5.2| can be easily adapted to the case of estimation by triangula- 
tions. 

On the right column of Fig. |3j estimators by Delaunay triangulation are shown, for the 
same noisy horizon function as in the wedgelet case. 

The rat es in Theorem [Q] are, up to a logarithmic factor, similar to the minimax rates 



obtained in E. Candes and D. DoNOHO (2002) with curvelets for a = 2 and more recently 
in C. Dossal et al. (2011) with bandelets for general a. This is in contrast to isotropic 
approximation methods, e.g. shrinkage of tensor product wavelet coefficients, which only 
attain the rate for a = 1. 
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6 Appendix 

We are going now to supply the proof of Theorem [2] 

Proof Proof of Theorem [2j Suppose that (b) holds. Theorem 1.5 in [6] gives 



ses„ 



> c < 2 • exp 



T is a norm and therefore t^(^s^") = A*sT^(^"). Because of (|9|, the inequality ^ holds 
and hencef'a^. Part (b) follows from the following two Lemmata 10 and 11 In fact, for 
s G S"" and ^s' — ^s.s' the inequality ([s]) boils down to P(|Cs| > c) < 2 exp(— c-^//3) and the 
lemmata apply. □ 

The missing lemmata read: 
Lemma 10 Let ^ be a random variable with P(|^| > c) < C • exp(— c^//3), /? > 0. Then 
E(exp(t^2)) < I ^ Ct/{I3-^ - t) whenever \t\ < 1//3. 

Proof. Let g be the distribution of |^|. With b ~ 1/(3 one computes 

= /J" 2i2/e*^>(|C| > 2/) dy < 2Ct/„°° ye^*-")^' dy = - t) if K| < 6. 

The proof is complete. □ 



Lemma 11 Let a > Q, 5 > 1. Then there is j3' G U {oo} such that for all centred 
random variables ^ with E(exp(Q;^^)) < S the estimate E(exp(/;^)) < exp(i^//?') holds for 
every t gR. 

A converse holds as well. 

Proof. Assume without loss of generality that a — 1. Let us first consider the case 
1^1 > 21n^/^ 5. Since - t/2)'^ > one has exp(t^) < exp(i2/4) exp(^2) ^ake expectations 
on both sides and use the assumption to get E(exp(i^)) < (5exp(i^/4). This implies 

E(exp(t^)) < exp(t'^/2) whenever \t\ > 2\/in^. 

Note that this estimate does not depend on the special variable ^. 

Let now \t\ < 2(ln(5)^/^. The function tp{t) = lnE(exp(t^)) is convex and hence ^''it) > 0; 
furthermore 

^(0) = and ip'{0) = E(0 = 0. (38) 
By the mean value theorem there is some £ [0, 1] such that 

ip{t) = </j(0) + V(0) + {t^/2)ip"{d{t)t) < (<V2) max{¥j"(t) : |<| < 2Vh^}. (39) 

Hence 1/ max(max{(^"(i) : \t\ < 2\n^^'^S},l) is a suitable scale factor for the ^ in question. 
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We must finally remove the dependency on moments of ^ in 

^"{t) = {E{e exp(<e))E(exp(iO) - E'{^exp{tO)) /E^ieMtO) ■ (40) 
To this end let rj be an independent copy of ^. Then the denominator becomes 

D = E{e exp(t(e + ry)) - E(eryexp(t(e + v))) = E((e ~ exp(i(e + v)))- 
With (a - 6)2 < (a - 6)^ + (a + 6)^ = 2{a'^ + 6^) we arrive at 

D < 2E{^^ exp{t^))E{exp{tO). 



By convexity of (p and (38 1 one has ip > and thus E(exp(i^)) > 1. Furthermore, < 
2exp(^2). In view of the restriction on t, the Cauchy-Schwartz inequality gives 

exp(te)) < E(e'')E(exp(2tO) < 2 • E'^{^^) exp{f) < 2 ■ 6^ ■ ^ 25^. 

By Jensen's inequality E(exp(t^)) > exp(tE(^)) = exp(t • 0) = 1. Hence 

D < 2^/^S^E{exp{tS_)) < 2^/^S^E^{exp{t£_)). 



Canceling out the numerator in (401 yields ma,x{ip"{t) : \t\ < 2(ln((5))^/2} < 2'^^'^5^ which 



completes the proof. □ 
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